-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-53442]Make PrometheusServlet compatible with OpenMetrics #52183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
b020853 to
e1e463b
Compare
|
@peter-toth - would you mind help review this as well ? thanks |
|
This PR is very similar to what you have done in apache/spark-kubernetes-operator#298 so I think it would make sense to split the formating logic to smaller parts like But my biggest concern is that apache/spark-kubernetes-operator#298 was a completely new feature, but here you change an already existing output format. Should't we offer a config to keep the old format as well? cc @dongjoon-hyun as you seem to have worked a lot on this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needs to be a double? https://github.com/apache/spark-kubernetes-operator/pull/298/files#diff-3281f083d917b4826b386883cf3a96b5eb2eb5d0c4334571ed15dec95358c1adR168 you didn't do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for pointing this out!
I was considering that even if this is a Number - it may not be correctly formatted with simple getValue if we are dealing something complex, like BigDecimal, or AtomicLong. etc. By doing a doubleValue we avoid the possible toString gives us a string instead of number.
spark-kubernetes-operator is relatively new and we are sure there's no such gauges - but IMO we need to be taking this into consideration as well. I'll fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it cause any issues if we omit max, min and mean values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the catch!
Though openmetrics consider these as optional, we shall not be ruling them out as they were present currently. I'll add them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
|
Thank you for pinging me, @peter-toth . |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Not only for format-wise, but also for the value-wise, does this PR emit the same set of values of metrics with the Apache Spark 4.0.1?
-
Is there any chance of discontinuation of graphs in the upper layers like
Grafana? -
May I ask how did you verify this, @jiangzho ?
This shall still be compatible witrh Prometheus, while adding more compatibility.
### What changes were proposed in this pull request? This PR makes PrometheusServlet publish snapshot that's compatible with OpenMetrics ### Why are the changes needed? Adopting OpenMetrics ensures our metrics output is forward-compatible with common observability tools, not only Prometheus, but also OpenTelemetry, Datadog and others. ### Does this PR introduce _any_ user-facing change? This shall still be compatible witrh Prometheus, while adding more compatibility. No direct user facing changes otherwise. ### How was this patch tested? CIs with added unit test ### Was this patch authored or co-authored using generative AI tooling? No
e1e463b to
c6b5a5d
Compare
Thanks for pointing that out! Yes, there may be value differences. For example, timer handling - while codahale gives us nanos, it's a Prometheus best practice to use seconds so we added the convertion. Would you suggest a list of possible value changes like this?
Since Grafana relies on Prometheus & does not query our pods directly, we'll need to ensure
For this scope, we printed a sample snapshot (consisted of the metrics added in the unit tests), that looks like And run a validation with Would you suggest adding integration test for this ? |
|
In that case, I'd like to recommend to propose to enlarge your contribution toward a new
In this way, Apache Spark can support |
+1 to the new servlet idea. |
|
Gentle ping, @jiangzho . FYI, Apache Spark |
What changes were proposed in this pull request?
This PR makes PrometheusServlet publish snapshot that's compatible with OpenMetrics
Why are the changes needed?
Adopting OpenMetrics ensures our metrics output is forward-compatible with common observability tools, not only Prometheus, but also OpenTelemetry, Datadog and others.
Does this PR introduce any user-facing change?
This shall still be compatible witrh Prometheus, while adding more compatibility. No direct user facing changes otherwise.
How was this patch tested?
CIs with added unit test
Was this patch authored or co-authored using generative AI tooling?
No